

 safety metric


Agentic Reinforcement Learning for Search is Unsafe

arXiv.org Artificial Intelligence

Agentic reinforcement learning (RL) trains large language models to autonomously call tools during reasoning, with search as the most common application. These models excel at multi-step reasoning tasks, but their safety properties are not well understood. In this study, we show that RL-trained search models inherit refusal from instruction tuning and often deflect harmful requests by turning them into safe queries. However, this safety is fragile. Two simple attacks trigger cascades of harmful searches and answers: one forces the model to begin its response with a search (Search attack), and the other encourages the model to search repeatedly (Multi-search attack). The attacks succeed by inducing the model to generate harmful, request-mirroring search queries before it can generate the inherited refusal tokens. This exposes a core weakness of current RL training: it rewards the continued generation of effective queries without accounting for their harmfulness. As a result, RL search models have vulnerabilities that users can easily exploit, making it urgent to develop safety-aware agentic RL pipelines that optimise for safe search.
Instruction tuning (IT) is the standard method to align large language models (LLMs) with human preferences and teach them to refuse harmful requests (Schulman et al., 2017; Shao et al., 2024). However, IT only shapes static responses and is insufficient in agentic settings, where models must also decide when and how to call external tools, capabilities that are not explicitly learned during pre-training (Zhang et al., 2025). Agentic reinforcement learning (RL) for tool use (Zhang et al., 2025) tackles this by fine-tuning models to interleave reasoning with tool calls (Dong et al., 2025). In practice, search is the most common tool: agentic RL rewards effective, well-timed search queries and achieves strong gains on multi-hop reasoning tasks (Song et al., 2025a;b; Jin et al., 2025).
Despite this progress, the effect of agentic RL on the safety of IT models remains unclear. While prior work has reported safety degradation in retrieval-augmented agents (Yu et al., 2025), little is known about whether agentic RL for search preserves the refusal of harmful requests. As agentic RL is now deployed in closed-source systems such as OpenAI's DeepSearch (OpenAI, 2025), this evaluation gap can create real deployment risks.


Safety Metric Aware Trajectory Repairing for Automated Driving

arXiv.org Artificial Intelligence

Recent analyses highlight challenges in autonomous vehicle technologies, particularly failures in decision-making under dynamic or emergency conditions. Traditional automated driving systems recalculate the entire trajectory whenever the environment changes. Instead, a novel approach retains valid trajectory segments, minimizing the need for complete replanning and reducing changes to the original plan. This work introduces a trajectory repairing framework that calculates a feasible evasive trajectory while computing the Feasible Time-to-React (F-TTR), balancing maintenance of the original plan with safety assurance. The framework employs a binary search algorithm to iteratively create repaired trajectories, guaranteeing both the safety and the feasibility of the repaired result. Earlier approaches separated the calculation of safety metrics from trajectory repairing, which resulted in unsuccessful plans for evasive maneuvers; in contrast, our work has the anytime capability to provide both an F-TTR and an evasive trajectory for further execution.
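The binary search over the reaction time can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: `plan_evasion` is a hypothetical planner that returns a repaired trajectory (the original plan up to step `k`, followed by an evasive maneuver) or `None` when no feasible, collision-free maneuver exists from that step. Feasibility is assumed to be monotone in `k`, meaning that if reacting at step `k` works, reacting at any earlier step also works. Under that assumption, the F-TTR is the latest feasible step multiplied by the step size `dt`.

```python
from typing import Callable, Optional, Tuple, TypeVar

Traj = TypeVar("Traj")


def feasible_time_to_react(
    n_steps: int,
    dt: float,
    plan_evasion: Callable[[int], Optional[Traj]],
) -> Tuple[Optional[float], Optional[Traj]]:
    """Binary-search the latest step at which an evasive maneuver is still feasible.

    Assumes monotone feasibility: if evading from step k succeeds, evading
    from any earlier step also succeeds. Returns (F-TTR in seconds, repaired
    trajectory), or (None, None) if no step admits a feasible evasion.
    """
    lo, hi = 0, n_steps - 1
    best_k: Optional[int] = None
    best_traj: Optional[Traj] = None
    while lo <= hi:
        mid = (lo + hi) // 2
        traj = plan_evasion(mid)  # hypothetical planner call
        if traj is not None:
            # Feasible: remember it (anytime result) and try reacting later.
            best_k, best_traj = mid, traj
            lo = mid + 1
        else:
            # Infeasible: the maneuver must start earlier.
            hi = mid - 1
    if best_k is None:
        return None, None
    return best_k * dt, best_traj


# Usage with a toy planner that can only evade up to step 37.
ttr, repaired = feasible_time_to_react(
    100, 0.1, lambda k: f"original[:{k}] + evasive" if k <= 37 else None
)
print(ttr, repaired)
```

The search always keeps the best feasible result found so far. If it is stopped early, it still returns a valid but more conservative answer, which loosely resembles the anytime property described in the abstract.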


Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models

arXiv.org Artificial Intelligence

Safety alignment is key to guiding large language models (LLMs) toward behaviors that are in line with human preferences and to restricting harmful behaviors at inference time, but recent studies show that it can be easily compromised by finetuning with only a few adversarially designed training examples. We aim to measure the risks in finetuning LLMs by navigating the LLM safety landscape. We discover a new phenomenon, observed universally in the model parameter space of popular open-source LLMs, termed the "safety basin": randomly perturbing model weights maintains the safety level of the original aligned model within its local neighborhood. This discovery inspires us to propose the new VISAGE safety metric, which measures the safety in LLM finetuning by probing its safety landscape. Visualizing the safety landscape of the aligned model lets us understand how finetuning compromises safety by dragging the model away from the safety basin. The LLM safety landscape also highlights the system prompt's critical role in protecting a model, and shows that this protection transfers to its perturbed variants within the safety basin. These observations from our safety landscape research provide new insights for future work in the LLM safety community.
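A one-dimensional probe of the kind described above can be sketched as follows. This is a generic random-direction landscape scan in the spirit of loss-landscape visualization. It is not the paper's VISAGE definition, which is not given here. `safety_fn` is a hypothetical callable that scores a weight vector (in practice it might be the fraction of harmful prompts the perturbed model refuses), and the toy score is invented for illustration. A flat plateau around `alpha = 0` in the averaged curve is what a "safety basin" would look like.

```python
from typing import Callable, Sequence

import numpy as np


def probe_safety_landscape(
    theta: np.ndarray,
    safety_fn: Callable[[np.ndarray], float],
    alphas: Sequence[float],
    n_directions: int = 5,
    seed: int = 0,
) -> np.ndarray:
    """Evaluate safety_fn(theta + alpha * d) along random directions d.

    Each direction is rescaled to the norm of theta, so alpha is a relative
    perturbation size. Returns an array of shape (n_directions, len(alphas)).
    """
    rng = np.random.default_rng(seed)
    curves = np.empty((n_directions, len(alphas)))
    for i in range(n_directions):
        d = rng.standard_normal(theta.shape)
        d *= np.linalg.norm(theta) / np.linalg.norm(d)
        for j, a in enumerate(alphas):
            curves[i, j] = safety_fn(theta + a * d)
    return curves


# Toy stand-in for a real safety score: safety decays smoothly with the
# distance from the aligned weights theta0 (purely illustrative).
theta0 = np.ones(10)


def toy_safety(w: np.ndarray) -> float:
    return float(np.exp(-np.sum((w - theta0) ** 2) / np.sum(theta0 ** 2)))


curves = probe_safety_landscape(theta0, toy_safety, alphas=[-0.5, 0.0, 0.5])
print(curves.mean(axis=0))  # average safety at each perturbation size
```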


Evaluation of automated driving system safety metrics with logged vehicle trajectory data

arXiv.org Artificial Intelligence

Real-time safety metrics help an automated driving system (ADS) assess the risk of driving situations and support decision-making. Although a number of real-time safety metrics have been proposed in the literature, systematic performance evaluation of these metrics has been lacking. Because different safety metrics adopt different behavioral assumptions, it is difficult to compare them and evaluate their performance. To overcome this challenge, we propose an evaluation framework that uses logged vehicle trajectory data. Because the trajectories of both the subject vehicle (SV) and the background vehicles (BVs) are available, the prediction errors caused by behavioral assumptions can be eliminated. Specifically, we examine whether the SV is in a collision-unavoidable situation at each moment, given all near-future trajectories of the BVs. In this way, we level the ground for a fair comparison of different safety metrics, since a good safety metric should always raise an alarm before the moment a collision becomes unavoidable. When trajectory data from a large number of trips are available, we can systematically evaluate and compare the statistical performance of different metrics. In the case study, three representative real-time safety metrics, namely time-to-collision (TTC), the PEGASUS Criticality Metric (PCM), and the Model Predictive Instantaneous Safety Metric (MPrISM), are evaluated using a large-scale simulated trajectory dataset. The proposed evaluation framework can help researchers, practitioners, and regulators characterize different metrics and select appropriate metrics for different applications. Moreover, failure analysis on the moments when a safety metric failed can reveal its potential weaknesses, which in turn can guide its refinement and improvement.
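The alarm criterion above can be sketched for TTC in a simple car-following setting. The sketch assumes that the framework has already computed the collision-unavoidable step `unavoidable_idx` from the logged BV trajectories; that reachability check is not shown. The alarm threshold `threshold_s` is a hypothetical tuning parameter. Aggregating the returned lead times, along with the fraction of `None` results, over many trips would produce the kind of statistical comparison the abstract describes.

```python
import math
from typing import Optional, Sequence


def time_to_collision(gap_m: float, v_follow: float, v_lead: float) -> float:
    """Car-following TTC in seconds; infinite when the gap is not closing."""
    closing_speed = v_follow - v_lead
    return gap_m / closing_speed if closing_speed > 0 else math.inf


def alarm_lead_time(
    ttc_series: Sequence[float],
    dt: float,
    unavoidable_idx: int,
    threshold_s: float,
) -> Optional[float]:
    """Seconds between the first TTC alarm and the collision-unavoidable step.

    Returns None if no alarm is raised at or before unavoidable_idx,
    i.e. the metric failed on this trip.
    """
    for i, ttc in enumerate(ttc_series[: unavoidable_idx + 1]):
        if ttc < threshold_s:
            return (unavoidable_idx - i) * dt
    return None


# Toy trip: the follower closes on the leader at 12 m/s, sampled every 0.5 s.
ttc = [time_to_collision(g, 20.0, 8.0) for g in [30.0, 24.0, 18.0, 12.0, 6.0]]
print(ttc, alarm_lead_time(ttc, dt=0.5, unavoidable_idx=4, threshold_s=1.8))
```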


DIT4BEARs Smart Roads Internship

arXiv.org Artificial Intelligence

The research internship at UiT - The Arctic University of Norway was offered to our team as the winner of the 'Smart Roads - Winter Road Maintenance 2021' Hackathon. The internship ran from 3 May 2021 to 21 May 2021, with meetings twice each week. Despite our different nationalities and educational backgrounds, we two interns tried to collaborate as a team as much as possible. The most rewarding part was that working on this project made us realize the critical conditions faced by people in the Arctic, a unique experience that would have been hard to gain from where we live. We developed and implemented several deep learning models to classify road states (dry, moist, wet, icy, snowy, slushy). Based on the best model, the weather forecast app will predict the state, taking Ta, Tsurf, Height, Speed, Water, etc. into consideration. The crucial part was to define a safety metric, which is the product of the accident rate based on friction and the accident rate based on road state. We developed a regressor that predicts the safety metric from the state obtained from the classifier and the friction obtained from the sensor data. A pathfinding algorithm has been designed using the sensor data, OpenStreetMap data, and weather data.
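The safety metric described above, a product of two accident rates, can be combined with a route search along the following lines. Everything numeric here is a hypothetical placeholder: the internship's actual accident-rate tables are not given, `friction_rate` is an invented monotone function, and the pathfinding algorithm used in the project is not specified. This sketch uses Dijkstra's algorithm and assumes an edge cost equal to segment length multiplied by the safety metric.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical accident-rate table by road state; real values are not given.
STATE_RATE = {"dry": 1.0, "moist": 1.2, "wet": 1.5, "icy": 4.0, "snowy": 3.0, "slushy": 2.5}


def friction_rate(friction: float) -> float:
    """Hypothetical accident rate that grows as the friction coefficient falls."""
    return 1.0 / max(friction, 0.05)


def safety_metric(state: str, friction: float) -> float:
    """Product of the friction-based and state-based accident rates."""
    return friction_rate(friction) * STATE_RATE[state]


# node -> list of (neighbor, length_km, road_state, friction)
Graph = Dict[str, List[Tuple[str, float, str, float]]]


def safest_path(graph: Graph, start: str, goal: str) -> Tuple[Optional[List[str]], float]:
    """Dijkstra with edge cost = length_km * safety_metric (an assumed weighting)."""
    dist = {start: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, length_km, state, friction in graph.get(u, []):
            nd = d + length_km * safety_metric(state, friction)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None, math.inf
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]


# A short icy road versus a longer dry detour (toy network).
roads: Graph = {
    "A": [("B", 1.0, "icy", 0.1), ("C", 2.0, "dry", 0.8)],
    "C": [("B", 2.0, "dry", 0.8)],
}
print(safest_path(roads, "A", "B"))
```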


Improving the Safety of 3D Object Detectors in Autonomous Driving using IoGT and Distance Measures

arXiv.org Artificial Intelligence

State-of-the-art object detectors are commonly evaluated with accuracy metrics such as mean Average Precision (mAP). Because mAP is not a direct safety indicator, we propose a straightforward safety metric, specifically for 3D object detectors in autonomous driving contexts, that combines the Intersection-over-Ground-Truth (IoGT) measure and a distance ratio. We then formulate a safety-aware loss function by adding an IoGT term to commonly used accuracy-oriented loss functions. Our experiments using models from the MMDetection3D library, the nuScenes dataset, and an in-house simulation dataset show that the object detector trained with our loss function significantly reduces unsafe predictions while remaining accurate and keeping the learning process stable.
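IoGT divides the intersection volume by the ground-truth volume only, so a box that fully covers the object scores 1 even when it is oversized, while a box that misses part of the object is penalized. The sketch below assumes axis-aligned boxes (real 3D detectors also predict yaw) and does not reproduce the paper's distance ratio, whose definition is not given here. `safety_aware_loss` is a hypothetical combination, labeled as such, showing one way an IoGT term could be added to an accuracy loss.

```python
from typing import Sequence

# Axis-aligned box: (x_min, y_min, z_min, x_max, y_max, z_max).
# Ignoring the yaw angle is a simplification made for this sketch.
Box = Sequence[float]


def volume(box: Box) -> float:
    return (
        max(0.0, box[3] - box[0])
        * max(0.0, box[4] - box[1])
        * max(0.0, box[5] - box[2])
    )


def iogt(pred: Box, gt: Box) -> float:
    """Intersection volume divided by ground-truth volume."""
    inter = [max(pred[i], gt[i]) for i in range(3)] + [
        min(pred[i + 3], gt[i + 3]) for i in range(3)
    ]
    gt_vol = volume(gt)
    return volume(inter) / gt_vol if gt_vol > 0 else 0.0


def safety_aware_loss(base_loss: float, pred: Box, gt: Box, weight: float = 1.0) -> float:
    """Hypothetical combination: accuracy loss plus a penalty for uncovered ground truth."""
    return base_loss + weight * (1.0 - iogt(pred, gt))


gt = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
print(iogt((1.0, 0.0, 0.0, 3.0, 2.0, 2.0), gt))     # → 0.5 (half the object uncovered)
print(iogt((-1.0, -1.0, -1.0, 3.0, 3.0, 3.0), gt))  # → 1.0 (oversized but fully covering)
```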


Exploring the Design of Adaptation Protocols for Improved Generalization and Machine Learning Safety

arXiv.org Artificial Intelligence

While directly fine-tuning (FT) large-scale, pretrained models on task-specific data is well known to induce strong in-distribution task performance, recent works have demonstrated that different adaptation protocols, such as linear probing (LP) prior to FT, can improve out-of-distribution generalization. However, the design space of such adaptation protocols remains under-explored, and the evaluation of such protocols has primarily focused on distribution shifts. Therefore, in this work, we evaluate common adaptation protocols across distribution shifts and machine learning safety metrics (e.g., anomaly detection, calibration, robustness to corruptions). We find that protocols…
While directly fine-tuning (FT) such models on task-specific data is known to improve in-distribution (ID) task performance (Neyshabur et al., 2020; Zhuang et al., 2019; Chen et al., 2020), recent work finds that FT does not effectively leverage the expressiveness of large-scale, pretrained representations and fails to match the out-of-distribution (OOD) performance of other adaptation protocols, such as the LP + FT protocol, which performs linear probing (LP) prior to FT (Kumar et al., 2022). Concurrently, Kirichenko et al. (2022) find that simply retraining the last (classifier) layer with a small amount of "re-weighting" or minority group data can safeguard against spurious correlations. Crucially, both works suggest that well-designed adaptation protocols can improve both ID task performance and robustness.
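The LP + FT protocol can be illustrated on a toy linear model. This numpy sketch simplifies heavily and is neither the protocol as run by Kumar et al. (2022) nor the one used by the authors. The "pretrained" feature extractor is a random matrix `W0`, the head `v` starts at zero, and training is plain gradient descent on squared error. In stage one (LP), only the head is updated. In stage two (FT), all parameters are updated, starting from the probed head.

```python
import numpy as np


def mse(W: np.ndarray, v: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    """Squared error of the toy model y_hat = v^T (W x)."""
    return float(np.mean((X @ W.T @ v - y) ** 2))


def grads(W: np.ndarray, v: np.ndarray, X: np.ndarray, y: np.ndarray):
    """Gradients of the MSE w.r.t. the feature matrix W (k, d) and the head v (k,)."""
    H = X @ W.T  # (n, k) features
    r = H @ v - y  # (n,) residuals
    n = len(y)
    g_v = 2.0 / n * (H.T @ r)
    g_W = 2.0 / n * np.outer(v, r @ X)  # vanishes when v == 0
    return g_W, g_v


def lp_ft(W, v, X, y, lp_steps=200, ft_steps=200, lr=0.01):
    """Linear probing (head only) followed by full fine-tuning."""
    W, v = W.copy(), v.copy()
    for _ in range(lp_steps):
        _, g_v = grads(W, v, X, y)  # features stay frozen
        v -= lr * g_v
    for _ in range(ft_steps):
        g_W, g_v = grads(W, v, X, y)  # features and head both move
        W -= lr * g_W
        v -= lr * g_v
    return W, v


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
y = X @ rng.standard_normal(4)
W0 = 0.5 * rng.standard_normal((3, 4))  # stand-in for "pretrained" features
v0 = np.zeros(3)
W_lp, v_lp = lp_ft(W0, v0, X, y, ft_steps=0)  # LP only
W_ft, v_ft = lp_ft(W0, v0, X, y)              # LP, then FT
print(mse(W0, v0, X, y), mse(W_lp, v_lp, X, y), mse(W_ft, v_ft, X, y))
```

In this toy, a zero head means `W` receives no gradient at all, because the `np.outer(v, ...)` term vanishes. The cited LP + FT work makes a more general argument about feature distortion.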


Network-level Safety Metrics for Overall Traffic Safety Assessment: A Case Study

arXiv.org Artificial Intelligence

Driving safety analysis has recently made unprecedented progress thanks to advances in computation frameworks, connected vehicle technology, new-generation sensors, and artificial intelligence (AI). In particular, the improved performance of deep learning (DL) methods has enabled higher levels of safety for autonomous vehicles and made it possible to process large volumes of imagery for driving safety analysis. An important application of DL methods is extracting driving safety metrics from traffic imagery. However, most current methods use safety metrics for micro-scale analysis of individual crashes or near-crash events, which offers little guidance for overall network-level traffic management. On the other hand, large-scale safety assessment efforts mainly emphasize the spatial and temporal distributions of crashes, without always revealing the safety violations that cause them. To bridge these two perspectives, we define a new set of network-level safety metrics for the overall safety assessment of traffic flow by processing imagery taken by roadside infrastructure sensors. An integrative analysis of the safety metrics and crash data reveals temporal and spatial correlations between the representative network-level safety metrics and crash frequency. The analysis uses two video cameras in the state of Arizona, along with a 5-year crash report obtained from the Arizona Department of Transportation. The results confirm that traffic management teams can use network-level safety metrics to equip traffic monitoring systems with advanced AI-based risk analysis and to make timely traffic flow control decisions.
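The correlation step can be sketched roughly as follows. The network-level metric below (violations per 1000 vehicles in each time bin) is a hypothetical stand-in, since the paper's metric definitions are not reproduced here. The counts in the usage lines are invented for illustration. On Python 3.10+, the standard library's `statistics.correlation` computes the same Pearson coefficient.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def violations_per_1000_vehicles(violations: Sequence[int], vehicles: Sequence[int]) -> List[float]:
    """Hypothetical network-level metric: safety violations per 1000 vehicles, per time bin."""
    return [1000.0 * v / n if n else 0.0 for v, n in zip(violations, vehicles)]


# Invented per-bin counts, for illustration only.
metric = violations_per_1000_vehicles([12, 30, 8, 25], [900, 1500, 700, 1200])
crashes = [1, 4, 0, 3]
print(pearson_r(metric, crashes))
```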


Autonomous vehicles need a large-systems approach to safety

#artificialintelligence

The six modules in the MSS are split between lagging and leading measures. Lagging measures track only outcomes, such as a crash, after they have occurred. Conversely, leading measures are proactive indicators that measure prevention efforts and can be observed and evaluated before a crash occurs, providing foresight into the technology's performance before deployment. By encompassing both types of measures, the MSS intends to produce an output that gives a comprehensive view of AV safety. Much like the modules themselves, the MSS will compete in the marketplace of safety systems. Federal, state and local regulators will select approaches from this marketplace to adopt, iterate on and develop. This open marketplace will drive greater transparency in safety data and greater substantive safety for pedestrians and passengers alike. Autonomous technology is expected to drastically improve the safety, sustainability, and mobility of our transportation systems. Acknowledging that a cohesive and inclusive approach to safety is key to accelerating AV development, the large-systems approach offers a new way of thinking about AV safety.


Yandex claims 2 million self-driving car miles, double in 4 months

#artificialintelligence

Yandex claims that its autonomous cars have driven 2 million miles to date, double the figure it reported in October and 4 times the number announced in August. The Russian tech titan revealed the latest milestone in its Q4 2019 financials earlier today. Additionally, Yandex shared for the first time that it has invested $35 million in its self-driving program since its inception, $24 million last year and $9 million in the fourth quarter alone. By way of a brief recap, Yandex's on-demand transport subsidiary, Yandex.Taxi, unveiled its self-driving car program back in May 2017, shortly before it began piloting the vehicles on Moscow roads. In the intervening years, the company has expanded its fleet across Russia and Israel (Tel Aviv) and to the U.S. (Las Vegas). It also laid claim to being the first public autonomous ride-hailing service when it launched a pilot in the Russian town of Innopolis back in 2018.